%expert[e89,jmc]		Expert Systems and Mathematical Logic (for IAKE)
% common.tex[e83,jmc]
% thomas[f88,jmc],
% final printed version in thomas[s89,jmc]
\input memo.tex[let,jmc]
\title{Expert Systems and Mathematical Logic}

% Abstract added 1990 January
\noindent Abstract: An {\it expert system} is a computer program
intended to embody the knowledge and ability of an expert in a
certain domain.  Their
performance in their specialized domains is often very
impressive.  Nevertheless, hardly any of them have certain {\it
common sense} knowledge and ability possessed by any
non-feeble-minded human.  This lack makes them ``brittle''.  By
this is meant that they are difficult to extend beyond the scope
originally contemplated by their designers, and they usually
don't recognize their own limitations.  Many important
applications will require common sense abilities.  The object of
this lecture is to describe common sense abilities, the
problems that require them and an approach to providing them
based on formalizing common sense knowledge and reasoning in
mathematical logic.
\vfill\eject

%This page taken with slight modifications from common.tex[e83,jmc].
	An {\it expert system} is a computer program intended to
embody the knowledge and ability of an expert in a certain
domain.  Some of the ideas behind them and several examples have
been described in other lectures in this symposium.  Their
performance in their specialized domains is often very
impressive.  Nevertheless, hardly any of them have certain {\it
common sense} knowledge and ability possessed by any
non-feeble-minded human.  This lack makes them ``brittle''.  By
this is meant that they are difficult to extend beyond the scope
originally contemplated by their designers, and they usually
don't recognize their own limitations.  Many important
applications will require common sense abilities.  The object of
this lecture is to describe common sense abilities and the
problems that require them.

	Common sense facts and methods are only very partially
understood today, and extending this understanding is the key
problem facing artificial intelligence.

	This isn't exactly a new point of view.  I have been
advocating ``Computer Programs with Common Sense'' since I wrote
(McCarthy 1959).  Studying common sense
capability has sometimes been popular and sometimes unpopular
among AI researchers.  At present it's popular, perhaps because
new AI knowledge offers new hope of progress.  Certainly AI
researchers today know a lot more about what common sense is than
I knew in 1958 --- or in 1969 when I wrote another paper on the
subject.  However, expressing common sense knowledge in formal
terms has proved very difficult, and the number of scientists
working in the area is still far too small.

	One of the best known expert systems is
Mycin (Shortliffe 1976; Davis, \allowbreak Buchanan and Shortliffe 1977),
a program for advising physicians on treating bacterial
infections of the blood and meningitis.
It does reasonably well without common sense, provided
the user has common sense and understands the program's limitations.

	Mycin conducts a question and answer dialog.
After asking basic facts about the patient such
as name, sex and age, Mycin asks about suspected bacterial
organisms, suspected sites of infection, the presence of specific
symptoms (e.g. fever, headache) relevant to diagnosis, the outcome
of laboratory tests, and some others.  It then recommends a certain
course of antibiotics.  While the dialog
is in English, Mycin avoids having to understand freely written
English by controlling the dialog.  It outputs sentences, but
the user types only single words or standard phrases.  Its major
innovations over many previous expert systems were that it
uses measures of uncertainty (not probabilities) for its
diagnoses and that it is prepared to explain its
reasoning to the physician, so he can decide whether to
accept it.

	Our discussion of Mycin begins with its {\it ontology}.
The ontology of a program is the set of entities that its
variables range over.  Essentially this is what it can have
information about.

	Mycin's ontology includes bacteria, symptoms, tests,
possible sites of infection, antibiotics and treatments.
Doctors, hospitals, illness and death are absent.  Even patients
are not really part of the ontology, although Mycin asks for many
facts about the specific patient.  This is because patients
aren't values of variables, and Mycin never compares the
infections of two different patients.  It would therefore be
difficult to modify Mycin to learn from its experience.

	Mycin's program, written in a general scheme called Emycin,
is a so-called {\it production system}.  A production system is a collection
of rules, each of which has two parts --- a pattern part and an
action part.  When a rule is activated, Mycin tests whether the
pattern part matches the database.  If so, the variables in the
pattern are bound to whatever entities are required for the match
against the database.  If not, the pattern fails and Mycin tries
another rule.  If the match is successful, Mycin performs the
action part of the rule using the values of the variables
determined by the pattern part.
The whole process of questioning and recommending is built up
out of productions.
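
	To make the pattern--action cycle concrete, here is a minimal
sketch of such an interpreter in Python.  It is a hypothetical
illustration of the general mechanism, not Emycin's rule language: the
predicate and organism names are invented, there are no certainty
factors, and the rules here are run forward by adding facts, whereas
Mycin's interpreter chains backward from diagnostic goals.

# A tiny production-system interpreter (hypothetical illustration).
# A fact is a tuple of strings; variables in patterns begin with "?".

def match(pattern, fact, bindings):
    """Try to extend bindings so that pattern matches fact."""
    if len(pattern) != len(fact):
        return None
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):          # a variable
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:                   # a constant that must agree
            return None
    return b

def match_all(patterns, facts, bindings):
    """Yield every set of bindings matching all patterns against the facts."""
    if not patterns:
        yield bindings
        return
    for fact in facts:
        b = match(patterns[0], fact, bindings)
        if b is not None:
            yield from match_all(patterns[1:], facts, b)

def run(rules, facts):
    """Fire rules until no rule adds a new fact."""
    changed = True
    while changed:
        changed = False
        additions = []
        for patterns, actions in rules:
            for b in match_all(patterns, facts, {}):
                for a in actions:
                    additions.append(tuple(b.get(t, t) for t in a))
        for new in additions:
            if new not in facts:
                facts.add(new)
                changed = True
    return facts

# An invented rule: if organism ?o is gram-negative and rod-shaped,
# record the suggestion that ?o may be an enterobacterium.
rules = [([("gram", "?o", "negative"), ("shape", "?o", "rod")],
          [("suggest", "?o", "enterobacteriaceae")])]
facts = {("gram", "org1", "negative"), ("shape", "org1", "rod")}
print(run(rules, facts))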

	The production formalism turned out to be suitable
for representing a large amount of information about the
diagnosis and treatment of bacterial infections.  When Mycin
is used in its intended manner it scores better than
medical students or interns or practicing physicians and
on a par with experts in bacterial diseases when the latter
are asked to perform in the same way.  However, Mycin
has not been put into production use, and the reasons given
by experts in the area varied when I asked whether it would
be appropriate to sell Mycin cassettes to doctors wanting
to put it on their micro-computers.
Some said it would be ok if there were a means of keeping
Mycin's database current with new discoveries in the field,
i.e. with new tests, new theories, new diagnoses and new
antibiotics.  For example, Mycin would have to be told
about Legionnaires' disease and the associated {\it Legionella}
bacteria, which became understood only
after Mycin was finished.  (Mycin is very stubborn about
new bacteria, and simply replies ``unrecognized response''.)

	Others said that Mycin is not even close to usable except
experimentally, because it doesn't know its own limitations.
I suppose this is partly a question of whether the doctor using
Mycin is trusted to understand the documentation about its
limitations.  Programmers always develop the idea that the
users of their programs are idiots, so the opinion that doctors
aren't smart enough not to be misled by Mycin's limitations may
be at least partly a consequence of this ideology.

	An example of Mycin not knowing its limitations can
be elicited by telling Mycin that the patient has {\it Vibrio cholerae}
in his intestines.  Mycin will cheerfully recommend two weeks of
tetracycline and nothing else.
Presumably this would indeed kill the bacteria, but
most likely the patient will be dead of cholera long before that.
However, the physician will presumably know that the diarrhea has
to be treated and look elsewhere for how to do it.

	On the other hand it may be really true that some measure
of common sense is required for usefulness even in this narrow
domain.  First we'll list some of the differences between
the common sense {\it informatic situation} and conventional
scientific theories.  Then
we'll list some areas of common sense knowledge
and reasoning ability and also apply the criteria to
Mycin and other hypothetical programs operating in Mycin's domain.
\section{What is Common Sense?}
%This page new.

	Our discussion of common sense involves two aspects.  First
we discuss what we call ``the common sense informatic situation'',
and second we discuss specific domains of common sense knowledge
and reasoning.

	The common sense informatic situation differs from that
of all kinds of formal scientific theories in at least three ways.

	1. The construction of an ordinary scientific theory
involves a decision as to what phenomena to take into account.
After that rules for the interaction of these phenomena are
decided on.  Often these rules take a mathematical form, but this
isn't the main distinction from common sense.  If new phenomena
become apparent, the theory must be modified from the outside.
The theory can't accept new phenomena within itself.

	The common sense informatic situation is open to
new phenomena.  Suppose on my way home on my bicycle I
unexpectedly encounter a herd of sheep on the road.  This
has never happened to me on the Stanford University Campus
and probably never will.  Maybe I will take another road,
maybe I will ride or walk my bicycle through the herd, and
maybe I will communicate with the sheep herder, policeman
or animal control officer about when the sheep will be out
of the way.  A formal model of bicycling on the Stanford
Campus, e.g. for a robot bicyclist, would not take sheep
into account.  Instead the robot would have to do as I would do---
appeal to general common sense knowledge of animal and
human behavior and of institutions like the police.

	The closest branch of science or engineering to AI is
operations research, because it undertakes to study problems from
any area of human endeavor and determine optimal behavior.  Let's
compare it with common sense knowledge and ability.  One of its
first applications, during World War II, was optimizing the
American and British airplane search for German submarines.  The
analysis and the subsequent change in search strategy
substantially increased the number destroyed.  The analysis took
into account when and where submarines were most likely to
surface and also facts about at what distances and from what
altitudes they could be detected under various weather
conditions.  The methodology involved the operations researcher
deciding what facts to take into account and making a
mathematical model.  Once this was decided the strategy was
determined.  The strategy itself could not take new phenomena
into account.  If a new phenomenon was noticed by pilots, they
would have to go back to the researchers to construct a new
model.

	Suppose, for example, the Germans had found some way to
use some kind of fake submarine as a decoy that would cause the
aircraft to reveal their presence by shooting at it.  Once the
pilots noticed this, they would use their common sense to try
to minimize the effect of the decoys, and the operations researchers
would use their common sense to decide how to modify the mathematical
model to take the decoys into account.

	A robot submarine chaser with common sense would have to
be able to take decoys and other new phenomena into account.

	Some people have strong intuitions that this is impossible,
and this leads them to believe that AI is impossible.

	Formalization of taking new phenomena into account has
proceeded slowly.  In the section on mathematical logic, we give
an example of what can be done.

	2. Here is the second way in which the common sense informatic
situation differs from that of a person or machine reasoning {\it within}
a formal scientific theory of the present type.  In both cases the
premisses used as a basis for the reasoning will be context dependent.
However, ordinary scientific theories don't take this into account.  If
the assumptions of the theory are to be transcended, this depends on
the common sense of the user.  Formalizing this aspect of common sense
is much less advanced.  However, it is possible to formalize contexts
and regard them as objects and discuss the relations of narrower and
broader contexts.  Some suggestions for this are in (McCarthy 1987).
It is also necessary to consider ``approximate theories'' which are
discussed in (McCarthy 1979).

	3. Formal scientific theories usually are intended for deriving
general laws.  They often are {\it epistemologically inadequate} for
expressing the information that can actually be obtained.  For example,
the Navier-Stokes equations govern the flow of water spilling from a
glass, but when a glass is spilled, neither a person nor a robot will
have the initial conditions that would enable one (even if the robot
could compute fast enough) to get results that would be helpful in avoiding
getting wet.  Instead humans have common sense physics that enables us
to get out of the way.  For robots, we will have to formalize this
common sense physics.

\section{Areas of Common Sense Knowledge}

	Now we discuss various areas of common sense knowledge---again
referring to Mycin.

	1. The most salient common sense knowledge concerns
situations that change in time as a result of events.  The most
important events are actions, and for a program to plan intelligently,
it must be able to determine the effects of its own actions.

	Consider the Mycin domain as an example.  The situation with which
Mycin deals includes the doctor,
the patient and the illness.  Since Mycin's actions are
advice to the doctor, full planning would
have to include information about the effects of Mycin's output on what
the doctor will do.  Since Mycin doesn't know about the doctor, it might
plan the effects of the course of treatment on the patient.  However, it
doesn't do this either.  Its rules give the recommended treatment as a function
of the information elicited about the patient, but Mycin makes no
prognosis of the effects of the treatment.  Of course, the doctors who
provided the information built into Mycin considered the effects of the
treatments.

	Ignoring prognosis
is possible because of the specific narrow domain in which
Mycin operates.  Suppose, for example, a certain antibiotic had
the precondition for its usefulness that the patient not have a fever.
Then Mycin might have to make a plan for getting rid of the patient's
fever and verifying that it was gone as a part of the plan for using
the antibiotic.  In other domains, expert systems and other
AI programs have to make plans, but Mycin doesn't.  Perhaps if I knew
more about bacterial diseases, I would conclude that their treatment
sometimes really does require planning and that lack of planning
ability limits Mycin's utility.

	The fact that Mycin doesn't give a prognosis is certainly
a limitation.  For example, Mycin cannot be asked on behalf of the
patient or the administration of the hospital when the patient is
likely to be ready to go home.  The doctor who uses Mycin must do
that part of the work himself.  Moreover, Mycin cannot answer a
question about a hypothetical treatment, e.g. ``What will happen
if I give this patient penicillin?'' or even ``What bad things might
happen if I give this patient penicillin?''.

	2. Various formalisms are used in artificial intelligence
for representing facts about the effects of actions and other
events.  However, all systems that
I know about give the effects of an event in a situation by
describing a new situation that results from the event.
This is often enough, but it doesn't cover the important case
of concurrent events and actions.  For example, if a patient
has cholera, while the antibiotic is killing the cholera bacteria,
the damage to his intestines is causing loss of fluids that are
likely to be fatal.  Inventing a formalism that will conveniently
express people's common sense knowledge about concurrent events
is a major unsolved problem of AI.

	3. The world is extended in space and is occupied by objects
that change their positions and are sometimes created and destroyed.
The common sense facts about this are difficult to express but
are probably not important in the Mycin example.  A major difficulty
is in handling the kind of partial knowledge people ordinarily have.
I can see part of the front of a person in the audience, and my
idea of his shape uses this information to approximate his total shape.
Thus I don't expect him to stick out two feet in back even though
I can't see that he doesn't.  However, my idea of the shape of his
back is less definite than that of the parts I can see.

	4. The ability to represent and use knowledge about knowledge
is often required for intelligent behavior.
What airline flights there are to Singapore is recorded in the
issue of the International Airline Guide current for
the proposed flight
day.  Travel agents know how to book airline flights and can
compute what they cost.  An advanced Mycin might need to reason that
Dr. Smith knows about cholera, because
he is a specialist in tropical medicine.

	5. A program that must co-operate or compete with people
or other programs must be able to represent information about
their knowledge, beliefs, goals, likes and dislikes, intentions
and abilities.
An advanced Mycin might need to know that a patient won't take a
bad tasting medicine unless he is convinced of its necessity.

	6. Common sense includes much knowledge whose domain overlaps
that of the exact sciences but differs from it epistemologically.
For example,
if I spill the glass of water on the podium, everyone knows that
the glass will break and the water will spill.  Everyone knows that
this will take a fraction of a second and that the water will not
splash even ten feet.  However, this information is not obtained by
using the formula for a falling body or the Navier-Stokes equations
governing fluid flow.  We don't have the input data for the equations,
most of us don't know them, and we couldn't integrate them fast
enough to decide whether to jump out of the way.  This common
sense physics is contiguous with scientific physics.  In fact
scientific physics is embedded in common sense physics, because
it is common sense physics that tells us what the equation
%
	$$s = {1\over 2} g t^2$$
%
means.
If Mycin were extended to be a robot physician it would have to know
common sense physics and maybe also some scientific physics.
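
	As a check on the ``fraction of a second'' claim (taking the
height of fall to be about a meter, which is our assumption rather than
a figure from the text), the formula gives
%
	$$t = \sqrt{2s/g} = \sqrt{2\times 1\,{\rm m}/(9.8\,{\rm m/s^2})}
\approx 0.45\,{\rm s},$$
%
but the point is precisely that nobody performs this calculation
before stepping out of the way.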

	It is doubtful that the facts of the common sense world can
be represented adequately by production rules.  Consider the fact that when two
objects collide they often make a noise.  This fact can be used to make
a noise, to avoid making a noise, to explain a noise or to explain the
absence of a noise.  It can also be used in specific situations involving
a noise but also to understand general phenomena, e.g. should an intruder
step on the gravel, the dog will hear it and bark.  A production rule
embodies a fact only as part of a specific procedure.
Typically they match facts about specific objects, e.g. a specific
bacterium, against a general rule and get a new fact about those objects.
\section{Common Sense and Mathematical Logic}

	If a computer is to store facts about the world and reason
with them, it needs a precise language, and the program has to embody
a precise idea of what reasoning is allowed, i.e. of how new formulas
may be derived from old.  Therefore, it was natural to try to use
mathematical logical languages to express what an intelligent computer
program knows that is relevant to the problems we want it to solve and
to make the program use logical inference in order to decide what to
do.  (McCarthy 1959) contains the first proposals to use logic in AI
for expressing what a program knows and how it should reason.
(Proving logical formulas as a domain for AI had already been
studied by several authors).

	The 1959 paper said:

\begingroup\narrower\narrower
% COMMON.TEX[E80,JMC] TeX version Programs with Common Sense
%
The {\it advice taker} is a proposed program for solving problems by
manipulating sentences in formal languages.  The main difference
between it and other programs or proposed programs for manipulating
formal languages (the {\it Logic Theory Machine} of Newell, Simon and
Shaw and the Geometry Program of Gelernter) is that in the previous
programs the formal system was the subject matter but the heuristics
were all embodied in the program.  In this program the procedures will
be described as much as possible in the language itself and, in
particular, the heuristics are all so described.

	The main advantages we expect the {\it advice taker} to have
is that its behavior will be improvable merely by making statements to
it, telling it about its symbolic environment and what is wanted from
it.  To make these statements will require little if any knowledge of
the program or the previous knowledge of the {\it advice taker}.  One
will be able to assume that the {\it advice taker} will have available
to it a fairly wide class of immediate logical consequences of anything
it is told and its previous knowledge.  This property is expected to
have much in common with what makes us describe certain humans as
having {\it common sense}.  We shall therefore say that {\it a program
has common sense if it automatically deduces for itself a sufficiently
wide class of immediate consequences of anything it is told and what
it already knows.}
\par\endgroup

	The main reasons for using logical sentences extensively in AI
are better understood by researchers today than in 1959.  Expressing
information in declarative sentences is far more modular than
expressing it in segments of computer program or in tables.  Sentences
can be true in much wider contexts than specific programs can be
useful.  The supplier of a fact does not have to understand much about
how the receiver functions, or how or whether the receiver will use it.
The same fact can be used for many purposes, because the logical
consequences of collections of facts can be available.

	The {\it advice taker} prospectus was ambitious in 1959, would
be considered ambitious today and is still far from being immediately
realizable.  This is especially true of the goal of expressing the
heuristics guiding the search for a way to achieve the goal in the
language itself.  The rest of this paper is largely concerned with
describing what progress has been made, what the obstacles are, and
how the prospectus has been modified in the light of what has been
discovered.

	The formalisms of logic have been used to differing
extents in AI.  Most of the uses are much less ambitious than
the proposals of (McCarthy 1959).  We can distinguish four
levels of use of logic.

	1. A machine may use no logical sentences---all its
``beliefs'' being implicit in its state.  Nevertheless, it is often
appropriate to ascribe beliefs and goals to the program, i.e. to
remove the above sanitary quotes, and to use a principle of
rationality---{\it It does what it thinks will achieve its goals}.
Such ascription is discussed from somewhat different points of view
 in (Dennett 1971), (McCarthy 1979) and
(Newell 1981).  The advantage is that the intent of the machine's
designers and the way it can be expected to behave may be more readily
described {\it intentionally} than by a purely physical description.

	The relation between the physical and the {\it intentional}
descriptions is most readily understood in simple systems that admit
readily understood descriptions of both kinds, e.g. thermostats.  Some
finicky philosophers object to this, contending that unless a system
has a full human mind, it shouldn't be regarded as having any mental
qualities at all.  This is like omitting the numbers 0 and 1 from the
number system on the grounds that numbers aren't required to count
sets with no elements or one element.
Indeed if your main interest is the null set or unit sets, numbers
{\it are} irrelevant.  However, if your interest is the number system
you lose clarity and uniformity
if you omit 0 and 1.  Likewise, when one studies phenomena like belief,
e.g. because one wants a machine with beliefs and which reasons about
beliefs, it works better not to exclude simple cases from the formalism.
One battle has been over whether it should be forbidden to ascribe to a simple
thermostat the belief that the room is too cold.
(McCarthy 1979) says much more about ascribing mental qualities
to machines, but that's not where the main action is in AI.

	2. The next level of use of logic involves computer programs
that use sentences in machine memory to represent their beliefs but
use other rules than ordinary logical inference to reach conclusions.
New sentences are often obtained from the old ones by ad hoc programs.
Moreover, the sentences that appear in memory belong to a
program-dependent subset of the logical language being used.  Adding
certain true sentences in the language may even spoil the functioning
of the program.  The languages used are often rather unexpressive
compared to first order logic, for example they may not admit
quantified sentences, or they may use a
different notation from that used for ordinary facts to represent
``rules'', i.e.  certain universally quantified implication sentences.
Most often, conditional rules are used in just one
direction, i.e. contrapositive reasoning is not used.  
Usually the program cannot infer new rules; rules
must have all been put in by the ``knowledge engineer''.  Sometimes
programs have this form through mere ignorance, but the usual
reason for the restriction is the practical desire to make the program
run fast and deduce just the kinds of conclusions its designer
anticipates.
  We
believe the need for such specialized inference will turn out to be
temporary and will be reduced or eliminated by improved ways of
controlling general inference, e.g. by allowing the heuristic rules to
be also expressed as sentences as promised in the above extract from
the 1959 paper.
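
	To illustrate the one-directional use of rules mentioned above:
from a rule $p ⊃ q$ together with an observed fact $¬q$, ordinary logic
licenses the contrapositive conclusion $¬p$, but a system that only
chains forward from $p$ to $q$ never draws it.  For instance (an
invented example), from ``if the culture is contaminated, the test is
positive'' and ``the test is negative'', such a system would not
conclude that the culture is uncontaminated.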

	3. The third level uses first order logic and also logical
deduction.  Typically the sentences are represented as clauses, and the
deduction methods are based on J. Allen Robinson's (1965) method of
resolution.  It is common to use a theorem prover as a problem solver,
i.e.  to determine an $x$ such that $P(x)$ as a byproduct of a proof of
the formula $\exists xP(x)$.
This level is less used for practical
purposes than level two, because techniques for controlling the
reasoning are still insufficiently developed, and it is common for the
program to generate many useless conclusions before reaching the desired
solution.  Indeed, unsuccessful experience (Green 1969) with this method
led to more restricted uses of logic, e.g. the STRIPS system of (Fikes and Nilsson
1971).
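
	As a minimal illustration of this use of a theorem prover (the
predicate and constant names are invented, not taken from any
particular system), suppose the database contains the fact
$at(Box1,RoomA)$ and we want an $x$ such that $at(Box1,x)$.  A
resolution prover refutes the negated goal $¬at(Box1,x)$ by unifying it
with the fact, producing the empty clause with the substitution
$x = RoomA$; that substitution is the answer extracted from the proof.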
%The promise of (McCarthy 1959) to express the
%heuristic facts that should be used to guide the search as logical
%sentences has not yet been realized by anyone.

	The commercial ``expert system shells'', e.g. ART, KEE and
OPS-5, use logical representation of facts, usually ground facts only,
and separate facts from rules.  They provide elaborate but not always
adequate ways of controlling inference.

	In this connection it is important to mention logic programming,
first introduced in Microplanner (Sussman et al., 1971) 
and from different points of view by Robert Kowalski (1979) and Alain
Colmerauer in the early 1970s.
A recent text is (Sterling and Shapiro 1986).  Microplanner
was a rather unsystematic collection of tools, whereas Prolog relies
almost entirely on one kind of logic programming, but the main idea
is the same.  If one uses a restricted class of sentences, the so-called
Horn clauses, then it is possible to use a restricted form of logical
deduction.  The control problem is then much eased, and it is possible
for the programmer to anticipate the course the deduction will take.
The price paid is that only certain kinds of facts are conveniently
expressed as Horn clauses, and the depth first search built into
Prolog is not always appropriate for the problem.


	Even when the relevant facts can be expressed as Horn
clauses supplemented by negation as failure, the reasoning
carried out by a Prolog program may not be appropriate.  For
example, the fact that a sealed container is sterile if all the
bacteria in it are dead and the fact that heating a can kills a
bacterium in the can are both expressible as Prolog clauses.
However, the resulting program for sterilizing a container will
kill each bacterium individually, because it will have to index
over the bacteria.  It won't reason that heating the can kills
all the bacteria at once, because it doesn't do universal
generalization.
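
	The point can be made concrete with a small sketch in Python
rather than Prolog (the container and bacterium names are invented, and
there is no real inference engine here): to establish that the can is
sterile, the goal is reduced, as the corresponding Prolog clauses would
reduce it, to a separate subgoal about each bacterium.

# Hypothetical illustration of the sterilization example, not a real
# Prolog program.  The data name which bacteria are in which container
# and which containers have been heated.
bacteria_in = {"can1": ["b1", "b2", "b3"]}
heated = {"can1"}

def dead(b, can):
    # "heating a can kills a bacterium in the can"
    return can in heated and b in bacteria_in[can]

def sterile(can):
    # "a sealed container is sterile if all the bacteria in it are dead";
    # like the corresponding Prolog clauses, this reduces the goal to a
    # subgoal dead(b, can) for each bacterium b, one at a time.
    return all(dead(b, can) for b in bacteria_in[can])

print(sterile("can1"))   # True, but only after checking b1, b2 and b3

Nothing in this style of reasoning concludes in one step that heating
the can kills all the bacteria at once; that would require the
universal generalization mentioned above.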

	Although  third level systems express both facts and rules
as logical sentences, they are still rather specialized.  The axioms
with which the programs begin are not general truths about the world
but are sentences whose meaning and truth is limited to the narrow
domain in which the program has to act.  For this reason, the ``facts''
of one program usually cannot be used in a database for other programs.

	4. The fourth level is still a goal.  It involves representing
general facts about the world as logical sentences.  Once put in
a database, the facts can be used by any program.  The facts would
have the neutrality of purpose characteristic of much human information.
The supplier of information would not have to understand
the goals of the potential user or how his mind works.  The present
ways of ``teaching'' computer programs by modifying them or
directly modifying their databases amount to ``education
by brain surgery''.

	A key problem for achieving the fourth level is to develop
a language for a general common sense database.  This is difficult,
because the {\it common sense informatic situation} is complex.
Here is a preliminary list of features and
considerations.

	1. Entities of interest are known only partially, and the
information about entities and their relations that may be relevant
to achieving goals cannot be permanently separated from irrelevant
information.  
%
(Contrast this with the situation in gravitational
astronomy in which it is stated in the informal introduction to
a lecture or textbook that
the chemical composition and shape of a body are irrelevant to the
theory; all that counts is the body's mass, and its initial position
and velocity).

	Even within gravitational astronomy, non-equational theories arise
and relevant information may be difficult to determine.  For example, it was
recently proposed that periodic extinctions discovered in the
paleontological record are caused by showers of comets induced by a
companion star to the sun that encounters and disrupts the Oort cloud of
comets every time it comes to perihelion.  This theory is qualitative
because neither the orbit of the hypothetical star nor those of the comets
is available.

	2. The formalism has to be {\it epistemologically adequate},
a notion introduced in (McCarthy and Hayes 1969).  This means that
the formalism must be capable of representing the information that
is actually available, not merely capable of representing actual
complete states of affairs.

	For example, it is insufficient to have a formalism that
can represent the positions and velocities of the particles in a
gas.  We can't obtain that information, our largest computers don't
have the memory to store it even if it were available, and our
fastest computers couldn't use the information to make predictions even
if we could store it.

\section{Formalized Nonmonotonic Reasoning}

	It seems that fourth level systems require extensions
to mathematical logic.  One kind of extension is formalized {\it nonmonotonic
reasoning}, first proposed in the late 1970s (McCarthy 1977, 1980, 1986),
(Reiter 1980), (McDermott and Doyle 1980), (Lifschitz 1989).
Mathematical logic has been monotonic
in the following sense.  If we have $A \vdash p$ and $A ⊂ B$, then we also
have $B \vdash p$.

	If the inference is logical deduction, then exactly the same
proof that proves $p$ from $A$ will serve as a proof from $B$. If the
inference is model-theoretic, i.e.  $p$ is true in all models of $A$,
then $p$ will be true in all models of $B$, because the models of $B$
will be a subset of the models of $A$.  So we see that the monotonic
character of traditional logic doesn't depend on the details of the
logical system but is quite fundamental.

	While much human reasoning is monotonic,
some important human common sense reasoning is not.  We
reach conclusions from certain premisses that we would not reach if
certain other sentences were included in our premisses.  For example,
if I hire you to build me a bird cage, you conclude that it is appropriate
to put a top on it, but when you learn the further
fact that my bird is a penguin, you no longer draw that
conclusion.  Some people think it is possible to try to save
monotonicity by saying that what was in your mind was not a general rule
about birds flying but a probabilistic rule.  So
far these people have not worked out any detailed
epistemology for this approach, i.e.  exactly what probabilistic
sentences should be used.  Instead AI has moved to directly formalizing
nonmonotonic logical reasoning.  Indeed it seems to me that
when probabilistic reasoning (and not just the axiomatic
basis of probability theory) has been fully formalized, it will
be formally nonmonotonic.
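
	The flavor of such formalizations can be conveyed by a small
sketch in the style of (McCarthy 1986); the particular predicate names
are ours.  Write
%
	$$(∀x)(bird(x) ∧ ¬ab(x) ⊃ flies(x)), \qquad bird(Tweety).$$
%
Circumscribing $ab$, i.e. assuming it has the smallest extension
compatible with the premisses, yields $¬ab(Tweety)$ and hence
$flies(Tweety)$.  Adding the premisses $penguin(Tweety)$ and
$(∀x)(penguin(x) ⊃ ab(x))$ forces $ab(Tweety)$, and the conclusion
$flies(Tweety)$ can no longer be drawn.  This is just the behavior
that monotonic deduction cannot exhibit.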

	Nonmonotonic reasoning is an active field of study.
Progress is often driven by examples, e.g. the Yale shooting
problem (Hanks and McDermott 1986), in which obvious
axiomatizations used with the available reasoning formalisms
don't seem to give the answers intuition suggests.  One direction
being explored (Moore 1985, Gelfond 1987, Lifschitz 1989)
involves putting facts about belief and knowledge explicitly in
the axioms---even when the axioms concern nonmental domains.
Moore's classical example (now 4 years old) is ``If I had an elder
brother I'd know it.''
% Put in corrections from glasgo.sli[e89,jmc] before reprinting this. Done
\section{Example of Formalization of Overcoming an Unexpected Obstacle}

	We conclude with an example of a nonmonotonic formalism
and its use.  We discuss how an unexpected obstacle vitiates
the inference that the usual sequence of actions will achieve
a goal.  Then, without changing any other premiss, we show how
inserting a suitable action in the sequence achieves the goal.

1. We use a general formalism for describing the effects of actions.
It is a variant due to Vladimir Lifschitz (1987) of the
situation calculus (McCarthy and Hayes 1969).

2. Specific facts concerning travel by airplane from one city
to another are given.  The existence of a flight and the possession
of a ticket are made explicit preconditions.

3. Facts relevant for flying from Glasgow to Moscow via London
are mentioned, i.e. the flights are mentioned.

4. The circumscription formalism of (McCarthy 1980) and
(McCarthy 1986) is used to minimize certain predicates, i.e.
$precond$, $noninertial$, $causes$, $occurs$.

5.  It can then be inferred (nonmonotonically) that flying
from Glasgow to London and then flying to Moscow results in being
in Moscow.

6. Facts giving the consequences of losing a ticket and
buying a ticket are included.  They affect the previous inference.

7. An assertion that the ticket is lost in London is then added
to the previous facts.  Now it can no longer be inferred that the
previous plan succeeds.  However, it can be inferred that the
plan of flying to London, then buying a ticket and then flying to
Moscow does succeed.

	This example shows that it is possible to make a formalism
that (1) can be used to infer that a certain plan will succeed, (2)
can no longer infer that the plan will succeed when an obstacle
is asserted, (3) can be used to infer that a different plan that
overcomes the obstacle will succeed.

	Some domains in which it is hoped to use expert systems
require this capability.

	Here are the formulas.
% from GLASGO.SLI[E89,JMC]/2P/59L with added comments
{% suppresses vertical bars
\overfullrule=0pt
%
$$holds(not p,s) ≡ ¬holds(p,s)$$
%
This relates the operator $not$ as applied to fluents to logical
negation.
%
$$succeeds(a,s) ≡ (∀p)(precond(p,a) ⊃ holds(p,s)).$$
%
This tells us that an action {\it succeeds} in a situation $s$
if all its preconditions hold in the situation.  Actually, it's
a definition of the predicate $succeeds$.
%
$$succeeds(a,s) ∧ causes(a,p) ⊃ holds(p,result(a,s)).$$
%
If an action succeeds in a situation and it is one that causes a
fluent to hold, then the fluent holds in the situation that
results from the performance of the action.
%
$$¬noninertial(p,a) ∧ holds(p,s) ⊃ holds(p,result(a,s))$$
%
This tells us that unless an action affects a fluent, then
the fluent holds after the action if it held before the action.
%
$$occurs(e,s) ⊃ outcome s = outcome result(e,s)$$
%
This and the next axiom give the effects of events different from actions.
%
$$(∀e)¬occurs(e,s) ⊃ outcome s = s$$
%
$$rr(a,s) = outcome result(a,s)$$
%
This is an abbreviation for the situation that results from an action after
all the events that occur after it have happened.
%
$$causes(fly(x,y),at y)$$
%
This is the first axiom specifically about the effects of flying.  It says
that flying from $x$ to $y$ causes being at $y$.
%
$$precond(at x,fly(x,y))$$
%
You must be at $x$ to fly from there to $y$.
%
$$precond(hasticket,fly(x,y))$$
%
Also you must have a ticket.
%
$$precond(existsflight(x,y),fly(x,y))$$
%
And there must be a flight.
%
$$causes(loseticket, not hasticket)$$
%
The effect of losing a ticket.
%
$$causes(buyticket,hasticket)$$
%
The effect of buying a ticket.
%
$$holds(at Glasgow,S0)$$
%
This is the first fact about the initial situation $S0$.  The
traveller is at Glasgow.
%
$$holds(hasticket,S0)$$
%
He has a ticket in $S0$.
%
$$holds(existsflight(Glasgow,London),S0)$$
%
$$holds(existsflight(London,Moscow),S0)$$
%
The necessary flights exist.
%
$$circum(Facts;causes,precond,noninertial,occurs;holds)$$
%
This is the circumscription that is done with the conjunction
(called $Facts$) of these axioms.  Understanding this may
require reading (McCarthy 1986), and (Lifschitz 1987) would
also help.
%
Once the circumscription has been done, we can show
%
$$\eqalign{holds(at Moscow,rr&(fly(London,Moscow),\cr
&rr(fly(Glasgow,London),S0))),\cr}$$
%
but not if we add
%
$$occurs(loseticket,result(fly(Glasgow,London),S0)).$$
%
However, in this case we can show
%
$$\eqalign{holds(at Moscow,rr&(fly(London,Moscow),\cr
&rr(buyticket,\cr
&\ \ rr(fly(Glasgow,London),S0)))).\cr}$$
\section{Acknowledgments}

	This article is mainly adapted for an audience of
knowledge engineers from (McCarthy 1983) and (McCarthy 1989).
The last section is entirely new.
\section{References}

\noindent {\bf Davis, Randall; Buchanan, Bruce; and Shortliffe,
Edward (1977)}:  ``Production Rules as a Representation for a
Knowledge-Based Consultation Program,'' {\it Artificial
Intelligence}, Volume 8, Number 1, February.

\noindent {\bf Dennett, D.C. (1971)}:
 ``Intentional Systems'', {\it Journal of Philosophy}
vol. 68, No. 4, Feb. 25.

\noindent
{\bf Fikes, R. and Nils Nilsson (1971)}:
``STRIPS: A New Approach to the Application of 
Theorem Proving to Problem Solving,'' {\it Artificial Intelligence}, Volume 2,
Numbers 3,4, January,
pp. 189-208.

\noindent
{\bf Gelfond, M. (1987)}: ``On Stratified Autoepistemic Theories'',
 {\it AAAI-87} {\bf 1}, 207-211.

\noindent
{\bf Green, C., (1969)}:
``Application of Theorem Proving to Problem Solving.'' In IJCAI-1, pp. 219-239.

\noindent
{\bf Hanks, S. and D. McDermott (1986)}: ``Default Reasoning, Nonmonotonic
Logics, and the Frame Problem'', in AAAI-86, pp. 328-333.

\noindent
{\bf Kowalski, Robert (1979)}: {\it Logic for Problem Solving},
North-Holland, Amsterdam.

\noindent
{\bf Lifschitz, Vladimir (1987)}:
``Formal theories of action'', in: {\it The Frame Problem in Artificial
Intelligence, Proceedings of the 1987 Workshop}, 1987.
% FTA.TEX[ARC,VAL]

\noindent
{\bf Lifschitz, Vladimir (1989)}: {\it Between Circumscription and
Autoepistemic Logic}, to appear in the Proceedings of the First
International Conference on Principles of Knowledge Representation
and Reasoning, Morgan Kaufmann.

\noindent {\bf McCarthy, John (1959)}: ``Programs with Common Sense'', in {\it
Proceedings of the Teddington Conference on the Mechanization of
Thought Processes}, Her Majesty's Stationery Office, London.
%  common[e80,jmc],
% common.tex[e80,jmc]

\noindent
{\bf McCarthy, John and P.J. Hayes (1969)}:  ``Some Philosophical Problems from
the Standpoint of Artificial Intelligence'', in D. Michie (ed), {\it Machine
Intelligence 4}, American Elsevier, New York, NY.
%  phil.tex[ess,jmc] with slight modifications

\noindent
{\bf McCarthy, John (1977)}:
``Epistemological Problems of Artificial Intelligence'', {\it Proceedings
of the Fifth International Joint Conference on Artificial 
Intelligence}, M.I.T., Cambridge, Mass.
%  ijcai.c[e77,jmc]

\noindent {\bf McCarthy, John (1979)}:
``Ascribing Mental Qualities to Machines'' in {\it Philosophical Perspectives 
in Artificial Intelligence}, Ringle, Martin (ed.), Harvester Press, July 1979.
%  .<<aim 326, MENTAL[F76,JMC],
% mental.tex[f76,jmc]>>

\noindent
{\bf McCarthy, John (1980)}: 
``Circumscription - A Form of Non-Monotonic Reasoning'', {\it Artificial
Intelligence}, Volume 13, Numbers 1,2, April.
%  .<<aim 334, circum.new[s79,jmc], cirnew.tex[s79,jmc]>>

\noindent
{\bf McCarthy, John (1983)}: ``Some Expert Systems Need Common Sense'',
in {\it Computer Culture: The Scientific, Intellectual and Social Impact
of the Computer}, Heinz Pagels, ed.
 vol. 426, Annals of the New York Academy of Sciences.
%paper
%presented at New York Academy of Sciences Symposium.
%  common[e83,jmc]
%common.tex[e83,jmc]

\noindent
{\bf McCarthy, John (1986)}:
``Applications of Circumscription to Formalizing Common Sense Knowledge'',
{\it Artificial Intelligence}, April 1986.
%  circum.tex[f83,jmc]

\noindent {\bf McCarthy, John (1987)}:
``Generality in Artificial Intelligence'', {\it Communications of the ACM},
Vol. 30, No. 12, pp. 1030-1035.
% genera[w86,jmc]

\noindent {\bf McCarthy, John (1989)}:
``Artificial Intelligence, Logic and Formalizing Common Sense'',
in Thomason, Richmond, ed. {\it Philosophical Logic
and Artificial Intelligence}, Kluwer Academic Publishing,
Dordrecht, Netherlands.
% thomas[f88,jmc],  final printed version in thomas[s89,jmc]

\noindent
{\bf McDermott, D. and J. Doyle, (1980)}:
``Non-Monotonic Logic I,'' {\it Artificial Intelligence\/},
Vol. 13, No. 1.

\noindent
{\bf Moore, R. (1985)}: ``Semantical Considerations on Nonmonotonic Logic'',
 {\it Artificial Intelligence} {\bf 25} (1), 75-94.

\noindent
{\bf Newell, Allen (1981)}: ``The Knowledge Level,'' {\it AI Magazine\/},
Vol. 2, No. 2

\noindent
{\bf Reiter, Raymond (1980)}: ``A Logic for Default Reasoning'', {\it Artificial
Intelligence}, Volume 13, Numbers 1,2, April.

\noindent
{\bf Robinson, J. Allen (1965)}: ``A Machine-oriented Logic Based
on the Resolution Principle''. {\it JACM}, 12(1), 23-41.

\noindent
{\bf Shortliffe, Edward H. (1976)}:
{\it Computer-Based Medical Consultations: \allowbreak MYCIN}, American Elsevier,
New York, NY.

\noindent
{\bf Sterling, Leon and Ehud Shapiro (1986)}: {\it The Art of Prolog}, MIT Press.

\noindent
{\bf Sussman, Gerald J., Terry Winograd, and 
Eugene Charniak (1971)}: ``Micro-planner Reference Manual,'' Report AIM-203A,
Artificial Intelligence Laboratory, Massachusetts Institute of Technology,
Cambridge.
\smallskip\centerline{Copyright \copyright\ 1989\ by John McCarthy}
\smallskip\noindent{This draft of EXPERT[E89,JMC]\ TEXed on \jmcdate\ at \theTime}
%File originated on 19-Aug-89
\vfill\eject\end